Classifying Unseen Cases with Many Missing Values
Abstract
Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, Boosting, Bagging, Sasc, and SascMB, relative to C4.5 in tolerating missing values in test data. In terms of average error over a representative collection of natural domains, Boosting is found to have a level of robustness similar to that of C4.5. Bagging performs slightly better than Boosting, while Sasc and SascMB perform better than both, with SascMB performing best. Furthermore, we propose a novel voting weight scheme for the committee learning techniques. Although it is very simple, it can improve the robustness of all four committee learning techniques in tolerating missing values in test data, especially when many missing values exist.
Similar resources
Using Association Rules to Make Rule-based Classifiers Robust
Rule-based classification systems have been widely used in real-world applications because rules are easy to interpret. Many traditional rule-based classifiers prefer small rule sets to large rule sets, but small classifiers are sensitive to missing values in unseen test data. In this paper, we present a larger classifier that is less sensitive to the missing values in unseen test...
A Critique of the View Claiming Conflict in the Verses of the Knowledge of the Unseen
The claim that the Quranic verses on the knowledge of the unseen conflict with one another is among those made by Brasher, the Jewish orientalist. He believes that the verses which treat the knowledge of the unseen as specific to God alone conflict with those verses that apparently refer to the awareness of the unseen held by the Prophet (p.b.u.h) and some divinely selected people. Classifying the ver...
Generalization to Unseen Cases
We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a data-dependent bound on the difference between off-training-set and standard generalization error. Our resu...
Replace Missing Values with EM algorithm based on GMM and Naïve Bayesian
In data mining applications, experimental datasets contain various kinds of missing values. Leaving missing values unsubstituted, or treating them inappropriately, is likely to cause many warnings or errors. In addition, many classification algorithms are very sensitive to missing values. For these reasons, handling the missing values is an important phase in many classification or d...